Phylogeographic analysis of Siraitia grosvenorii in subtropical China provides insights into the origin of cultivated monk fruit and conservation of genetic resources

Abstract Siraitia grosvenorii, an economically important plant species with high medicinal value, is endemic to subtropical China. To determine the population structure and origin of cultivated S. grosvenorii, we examined the variation in three chloroplast DNA regions (trnR‐atpA, trnH‐psbA, trnL‐trnF) and two orthologous nuclear genes (CHS and EDL2) of S. grosvenorii in 130 wild individuals (selected from 13 wild populations across its natural distribution range) and 21 cultivated individuals using a phylogeographic approach. The results showed three distinct chloroplast lineages, which were restricted to different mountain ranges, and strong plastid phylogeographic structure. Our findings suggest that S. grosvenorii likely experienced ancient range expansion and survived in multiple refuges in subtropical China during glacial periods, resulting in population fragmentation in different mountainous areas. Our results also demonstrated that wild populations in Guilin (Guangxi, China) share the same gene pool as cultivated S. grosvenorii, suggesting that current cultivars were collected directly from local wild resources, consistent with the principles of “nearby domestication.” The results of this study provide insights into improving the efficiency of S. grosvenorii breeding using a genetic approach and outline measures for the conservation of its genetic resources.


| INTRODUC TI ON
Siraitia grosvenorii, also known as monk fruit or Luohanguo, is a dioecious perennial vine belonging to the Siraitia genus of the Cucurbitaceae family ( Figure 1). The genus Siraitia comprises four species, among which only S. grosvenorii is native to China (Li, 1993).
The close relationship between S. grosvenorii and Siraitia siamensis is supported by morphological similarities and adjacent distribution in China. However, the seed of S. grosvenorii is typically 2-layered winged, which is easily distinguishable from that of S. siamensis. Siraitia grosvenorii has been cultivated for more than 100 years in China (Ding et al., 2015). Currently, S. grosvenorii is prevalent in the Yongfu, Longsheng, and Lingui counties of Guilin (Guangxi, China), and is considered an economically important crop in Guangxi. S. grosvenorii was introduced from Guangxi into other provinces in southern China. In addition to its use as Chinese traditional medicine (Chinese Pharmacopoeia Commission, 2015), S. grosvenorii is highly valued in the food industry. The fruit of S. grosvenorii contains mogrosides, which are approximately 300 times sweeter than sucrose (Kasai et al., 1989). Because mogrosides are natural and low-calorie compounds, the mogroside extract from S. grosvenorii is considered to be an ideal sugar substitute (Yan et al., 2008). Given the increasing demand for non-nutritive sweeteners from natural sources, S. grosvenorii sweeteners have been widely used in the food and beverage industries for the development of low-calorie products (Pandey & Chauhan, 2019). In 2020, S. grosvenorii was planted in an area of over 10,000 ha in Guilin, accounting for more than 85% of the global production (Lu et al., 2022). However, because of the continuous vegetative propagation of S. grosvenorii, its cultivated accessions have become more vulnerable to pests and diseases (Li et al., 2004).
Wild genetic resources allow plant breeders to select and breed for the desired characteristics, which facilitates not only the maintenance but also the improvement of agricultural productivity (Day-Rubenstein et al., 2005; Prescott-Allen & Prescott-Allen, 2013). To fully utilize the genetic resources of S. grosvenorii, a proper understanding of the population structure of its wild accessions and the origin of its cultivated accessions is greatly needed.
The area of China ranging from 22°N to 34°N and bordered by 105°E in the west to the coastline in the east is generally referred to as subtropical China (Zhao et al., 1986). This area is characterized by a complex landform, with numerous temperate forests at high elevations and extensive tropical and subtropical forests at lower altitudes (Qian & Ricklefs, 2000). S. grosvenorii is naturally distributed in subtropical China and mainly grows in the Nanling (comprising a series of five connected mountain ranges spanning from west to east, including Yuechengling, Dupangling, Mengzhuling, Qitianling, and Dayuling), Dayaoshan, Yunkai, Jiulianshan, and Wugongshan Mountains at an altitude of 400-1400 meters. Global climatic fluctuations, particularly Quaternary climate oscillations, might have profoundly shaped the distribution range and genetic structure of plants (Abbott et al., 2000;Avise, 2000). Although glaciers are not found at low elevations (<2500 m) in subtropical China (Ni et al., 2010), a cold and dry climate has developed in this region because of the intensification of winter monsoons (Shi et al., 2006;Wang et al., 2012). S. grosvenorii diverged from members of the Cucurbitaceae family approximately 40-60 million years ago (Mya) (Guo et al., 2020;Xia et al., 2018), and its current genetic structure and diversity across its distributional range might be influenced by the Quaternary climate changes. Previously, the genetic diversity of the wild populations and cultivars of S. grosvenorii was analyzed using random amplified polymorphic DNA (RAPD), inter-simple sequence repeat (ISSR), and amplified fragment length polymorphism (AFLP) markers, which revealed a high level of genetic differentiation among the wild populations (Tang, Bin, et al., 2007). Additionally, the genetic diversity of cultivars was discovered to be much lower than that of wild populations (Peng et al., 2005;Tang, Bin, et al., 2007;Zhou et al., 2005;Zhou & Tang, 2006). Based on RAPD markers, Tang, Li, et al. (2007) speculated that clonal growth plays a role in shaping the spatial genetic structure of S. grosvenorii. However, the population genetic structure of S. grosvenorii at different geographical scales and its evolutionary history, such as potential refugia and population expansion, remains unknown, mainly because the DNA markers used in previous studies offer limited utility in inferring population structure and dynamics.
In this study, we report the results of the phylogeographical analysis of S. grosvenorii in subtropical China using three chloroplast DNA (cpDNA) fragments and two orthologous nuclear genes. We aimed to (1) investigate the genetic structure of S. grosvenorii, and  Table 1. Because S. grosvenorii exhibits clonal reproduction and has a small population size, the individual plants were sampled from sites located >10 m apart Gong et al., 2016;Zhang et al., 2022). Fresh leaves were dried in silica gel for subsequent DNA extraction. G. L.
Shi 14316 (IBSC0208492!) was serve as voucher specimen for the population ZQ because they were collected in the same location.
The voucher specimens of remaining populations were deposited in the Herbarium of the Guangxi Institute of Botany (IBK), China.
Voucher specimen numbers are listed in Table 1.

| DNA extraction, PCR amplification, and DNA sequencing
Genomic DNA was extracted from the dried leaves using the modified cetyltrimethylammonium bromide (CTAB) method (Doyle & Doyle, 1987); the modifications to the CTAB method included a 2-h incubation in 65°C water bath, and two rounds of DNA extraction using chloroform-isoamyl alcohol (24: 1). Three highly polymorphic cpDNA fragments, including trnR-atpA (Dane & Lang, 2004), trnH-psbA (Sang et al., 1997), and trnL-trnF (Taberlet et al., 1991) were selected. However, the universal nuclear marker, nrITS, could not be applied to S. grosvenorii as we failed to sequence the PCR products. Therefore, two nuclear genes, chalcone synthase (CHS) and EID1-like F-box protein 2 (EDL2), were selected. CHS has been used in phylogeographic studies (Ikeda et al., 2009;Liao et al., 2010;Wang et al., 2018). The nucleotide sequence of the EDL2 gene of Momordica charantia (Cucurbitaceae) was obtained from the National Center for Biotechnology Information (NCBI; GenBank accession no. XR_002601804), and its orthologs were identified using the OrthoMCL database (http://ortho mcl.org/ortho mcl/). The nuclear genes were PCR amplified with sequence-specific primers designed using Primer3Web (http://prime r3.wi.mit.edu/). Primer details are given in Table 2. The amplification was carried out in 50-μL reactions, each containing 0.5 μL of genomic DNA, 5 μL of 10× PCR buffer (Mg 2+ plus), 4 μL of dNTP mix (2.5 μM), 0.5 μL of each primer (50 mM), and 2.5 U of ExTaq DNA polymerase (TaKaRa, Dalian, China). The amplification program was as follows: predenaturation at 94°C for 5 min, followed by 30 cycles of denaturation at 94°C for 30 s, annealing at 55 or 58°C (Table 2) for 30 s, and elongation at 72°C for 1 min, and a final extension at 72°C for 10 min. After purification of PCR products, Sanger sequencing was conducted by Sangon Biotech (Shanghai, China).
Genes with double peaks at polymorphic sites were regarded as heterozygous. Haplotypes were determined using the method described by Clark (1990) if the gene sequence contained a single heterozygous site, and through the cloning and sequencing of PCR products if the sequence contained two or more peaks. To clone the gene, DNA was amplified using high-fidelity PrimeSTAR HS DNA polymerase (TaKaRa, Dalian, China), as described above. The cloned DNA was purified and subjected to Sanger sequencing by Sangon Biotech (Shanghai, China); 10 clones were sequenced per PCR product. All haplotype sequences in this study were deposited in the GenBank database under accession numbers listed in Tables 3-6.

| Sequence analysis
To analyze cpDNA and nuclear data, DNA sequences were aligned using MUSCLE 3.8.31 (Thompson et al., 1994) in MEGA 7.0.14 (Kumar et al., 2016), and edited manually in BioEdit 7.0.1 (Hall, 1999) where necessary. Three cpDNA regions (trnR-atpA, trnH-psbA, and trnL-trnF) were amplified from 151 individuals and aligned individually, then they were concatenated using the concatenate sequence plugin in PhyloSuite 1.2.2  to define cpDNA haplotypes and to conduct further analysis. The nuclear DNA sequences of CHS and ELD2 were analyzed independently.
A phylogenetic tree of cpDNA haplotypes was constructed using the maximum likelihood (ML) method and implemented with the IQ-TREE 1.6.12 program (Nguyen et al., 2015). The sister taxon Siraitia siamensis (GenBank accession no. MK75585) was selected as the outgroup. The F81 + F model was selected as the most appropriate substitution model based on Bayesian information criterion (BIC) implemented in IQ-TREE ModelFinder (Kalyaanamoorthy et al., 2017).
Statistical support for branching patterns was estimated using 1000 bootstrap replicates.
Values of total genetic diversity (H T ) and within-population genetic diversity (H S ) were calculated using PERMUT 2.0 (Pons & Petit, 1996). To infer the phylogeographic structure of S. grosvenorii, significant differences between G ST and N ST were calculated using a permutation test in PERMUT 2.0 with 1000 permutations. Analyses of molecular variance (AMOVA) were performed using Arlequin 3.11 (Excoffier et al., 2005), and the significance level of each variance component was assessed using 1000 permutations. In addition, AMOVA analyses with three chlorotype-based genetic lineages identified in this study were also conducted. Genetic variation was quantified at three hierarchical levels: among populations, within populations, and among lineages of populations identified by three chlorotype-based genetic lineages found in this study.
Divergence times for the three chlorotype-based lineages, identified based on cpDNA data, were estimated using BEAST 1.6.1 (Drummond & Rambaut, 2007). A total of 1.0 × 10 −9 (minimum) to 3.0 × 10 −9 (maximum) substitutions per site per year was used as a range of average mutation rates, as reported for synonymous sites in the chloroplast genes of angiosperm species (Wolfe et al., 1987).
The strict molecular clock model was used as the basic model for branch rates implemented in BEAST (Zuckerkandl & Pauling, 1965), which can be calibrated by setting a substitution rate or dates of specific nodes or node sets (Drummond & Rambaut, 2007). In this study, The historical population dynamics of S. grosvenorii was determined by performing neutrality tests and estimating the mismatch distribution of 13 wild populations and three chlorotype-based genetic lineages identified in this study. Neutrality tests were performed by calculating Tajima's D (Tajima, 1989) and Fu and Li's D (Fu & Li, 1993) using DNAsp 5.0, and Fu's Fs (Fu, 1997) using Arlequin 3.11. Mismatch distribution analyses were performed using a sudden (stepwise) expansion model (Rogers & Harpending, 1992) in

| Population structure based on chlorotypes
The alignment of the combined cpDNA sequences was 1259 bp in length, and contained 32 polymorphic sites, resulting in seven chlorotypes (C1-C7; Tables 3 and 4). The different chlorotypes and their network and geographical distribution are presented in Table 7 and  Table 7). The h and π values of wild populations were 0.822 and 0.00978, respectively (Table 7). No ancestral haplotype was found in the chlorotype network. C1 was the most abundant chlorotype, and was shared by three populations (MES, BL, and JT) and cultivars.

| Population structure based on CHS and EDL2 haplotypes
The CHS sequence alignment was 1014 bp in length and contained 25 polymorphic sites (Table 5). Forty-five CHS haplotypes (H1-H45) were identified, among which twenty-five were unique to a sin-  and π (0.00422) were found in the JLS and SG populations, respectively (Table 7).
The EDL2 sequence alignment was 607 bp in length and contained 13 polymorphic sites, resulting in 15 haplotypes (E1-E15,

| Demographic history of S. grosvenorii
The values of Tajima's D, Fu and Li's D, and Fu's Fs for cpDNA haplotypes of the wild S. grosvenorii populations were positive (Table 9).
In addition, the mismatch distribution was multimodal (Figure 5a), and both the SSD and H Rag statistics were significant (Table 9) (Table 9).
When the test was performed on the three chlorotype-based genetic lineages identified in this study, the Tajima (Table 9). The SSD and H Rag statistics of the three lineages in CHS data and those of MD and YK lineages in EDL2 data were nonsignificant (p > .05) ( Table 9).
Haplotypes shared among the three lineages (e.g., haplotypes H12, E3, and E2) represent the internal nodes of the network, could be considered as ancestral, and would predate the divergence of populations. We did not find a firm recent population expansion

TA B L E 5 (Continued)
signal in our nuclear dataset. The distribution of these haplotypes across all regions could probably be attributed more to the persistence of ancestral polymorphisms than to recent gene flow and dispersal. Thus, the most likely scenario is that S. grosvenorii experienced an ancient expansion event that brought ancestral haplotypes H12 and E3 to different parts of the range, which resulted in the origin of numerous derived haplotypes in these isolated locations; for example, 25 CHS haplotypes and 6 EDL2 haplotypes, which were private to a single population. Nevertheless, the nuclear haplotypes H3, E1, and E8 occurred in MD (HZ population only) and YD lineages, and E6 was shared between the MD and YK lineages. These results suggest limited pollen dispersal among lineages within S. grosvenorii.

| Origin of cultivated S. grosvenorii and implications for improvement and conservation of genetic resources
Population genetics-based insights will enable a better understanding of the origin of the crop and will help to identify raw materials for breeding and crop improvement (Ramanatha & Hodgkin, 2002;Turner-Hissong et al., 2020). S. grosvenorii is documented to have been cultivated in Longsheng, Yongfu, and Lingui counties of Guilin for at least 140 years (Ding et al., 2015;Yang, 2004) and then was introduced from Guangxi to Guangdong, Hunan, Jiangxi, and Guizhou Provinces (Li et al., 2004). Our data revealed the presence of only one chlorotype (C1) in cultivated S. grosvenorii (Figure 2b Table 7). This unique composition of the genetic lineages shows a great potential for improving the cultivated varieties. However, during the fieldwork, we noted that wild populations of S. grosvenorii had suffered a rapid decline and were even extirpated at some distribution points, because of habitat deterioration.  GeneBank  accession  number  22  44  109  175  196  199  208  271  310  376  418 476 520 Small solid circles with "mv" indicate unsampled or extinct haplotypes. Each line between haplotypes without short black bars represents a mutational step. Short black bars between haplotypes represent multiple changes (indicated by adjacent numbers). MD, YD, and YK represent the three genetic lineages identified by the phylogenetic analysis of chlorotypes. (d) Maximum likelihood (ML) phylogenetic tree and maximum clade credibility (MCC) tree of the seven chlorotypes (C1-C7) detected in S. grosvenorii using S. siamensis as an outgroup. Numbers above the branches indicate the result of ML bootstrap (BP) analysis (no numbers correspond to supports weaker than 80% BP) and posterior probability (PP) of the MCC tree. The node age (million years ago [Mya]) of major lineages is indicated next to each node, based on the minimum and maximum rates of synonymous substitutions in cpDNA.

| CON CLUS IONS
The molecular data collected in the current study permit the first    (2022)(2023).

CO N FLI C T O F I NTER E S T S TATEM ENT
The author declares that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in the NCBI Nucleotide database at https://www.ncbi.nlm.nih.